Multi-modal deep learning methods for classification of chest diseases using different medical imaging and cough sounds

Chest disease refers to a wide range of conditions affecting the lungs, such as COVID-19, lung cancer (LC), consolidation lung (COL), and many more. When diagnosing chest disorders, medical professionals may be misled by overlapping symptoms such as fever, cough, and sore throat. Researchers and medical professionals use chest X-rays (CXR), cough sounds, and computed tomography (CT) scans to diagnose chest disorders. The present study aims to classify nine classes: normal and eight chest disorders, namely COVID-19, LC, COL, atelectasis (ATE), tuberculosis (TB), pneumothorax (PNEUTH), edema (EDE), and pneumonia (PNEU). To this end, we propose four novel convolutional neural network (CNN) models that learn distinct image-level representations for the nine classes by extracting features from images. The proposed CNN employs several techniques, including a max-pooling layer, batch normalization layers (BANL), dropout, rank-based average pooling (RBAP), and multiple-way data generation (MWDG). The scalogram method is used to transform cough sounds into a visual representation. Before training, the SMOTE approach is used to balance the CXR, CT scan, and cough sound image (CSI) classes. The CXR, CT scan, and CSI data used for training and evaluating the proposed model come from 24 publicly available benchmark chest disease datasets. The classification performance of the proposed model is compared with that of seven baseline models, namely Vgg-19, ResNet-101, ResNet-50, DenseNet-121, EfficientNetB0, DenseNet-201, and Inception-V3, in addition to state-of-the-art (SOTA) classifiers. The effectiveness of the proposed model is further demonstrated by ablation experiments. The proposed model achieved an accuracy of 99.01%, outperforming both the baseline models and the SOTA classifiers. As a result, the proposed approach can offer significant support to radiologists and other medical professionals.


Introduction
The COVID-19 epidemic continues to affect public health services. As of 10 February 2023, the World Health Organization (WHO) reported a total of 753,479,439 confirmed cases of COVID-19 and 6,812,798 deaths worldwide [1]. Despite the overall decline in newly reported cases, it remains necessary to identify potentially infectious patients, differentiate COVID-19 from other respiratory disorders, and establish appropriate isolation and treatment procedures [2]. Procedures for detecting illnesses and monitoring their course are very important in healthcare institutions. Reverse transcription-polymerase chain reaction (RT-PCR) analysis is the "gold standard" test for determining whether a patient has a COVID-19 infection [3]. Although RT-PCR is a viable diagnostic tool, it requires highly trained personnel to collect nasopharyngeal swabs and a specialized laboratory to conduct the analysis [4]. Because of the large number of infected individuals, results may take a few hours or a few days, and the significant variation in the frequency of false-negative results has not been fully addressed [3,4]. Patient assessment and management systems are essential, yet some medical facilities, especially those in less developed countries, do not have complete access to RT-PCR [5].
Every year, 7% of the world's population is diagnosed with pneumonia (PNEU), which can be lethal [6]. PNEU is a potentially dangerous infection that may have severe consequences in a short time because of the persistent flow of fluid into the lungs, which can lead to drowning. In PNEU, bacteria and other pathogens cause inflammation of the alveoli, the air sacs of the lungs [7]. As the number of pathogens in the lungs increases, white blood cells fight back against the bacteria and fungi, causing sores to appear in the air sacs [8]. As a result, a portion of the air sacs fills with infected fluid, which leads to breathing difficulties as well as coughing and fever [9]. A person may die from PNEU if it is not treated with the prescribed drugs at an early stage [10,11]. Lung cancer (LC) is the most lethal form of cancer and the leading cause of cancer-related deaths worldwide [12]. Although the prevalence of smoking continues to decline in the vast majority of developed countries [13], a sizeable portion of the population remains at increased risk of developing lung cancer.
Patients infected with COVID-19 often exhibit symptoms including fever, cough, loss of taste and/or smell, sore throat, chest discomfort, and shortness of breath [14]. Patients with PNEU, pneumothorax (PNEUTH) [15], LC, or tuberculosis (TB) [16] are likely to experience the same symptoms. COVID-19 and other chest ailments such as TB, PNEU, and PNEUTH may therefore be difficult for medical professionals to diagnose. Researchers and medical experts are working to develop a dependable approach to diagnosing these chest conditions, and have turned to imaging analysis with chest X-rays (CXR) and computed tomography (CT) scans to diagnose COVID-19 and other chest-related disorders. Chest imaging abnormalities that are specific to SARS-CoV-2 infection may be seen in patients who have this virus. CXR and CT scans are the most common diagnostic tools for multiple chest diseases, such as COVID-19 [15], LC [16], atelectasis (ATE) [17], consolidation lung (COL) [18], TB [19], PNEUTH [20], edema (EDE) [21], and pneumonia (PNEU) [22], in symptomatic patients. Several studies [23][24][25] have also used cough sounds to detect COVID-19 and PNEU. These modalities have seen widespread use as an integral component of preliminary screening, particularly when the patient has significant respiratory symptoms [26,27]. Because we do not know how the illness will develop over the next few years, chest problems, including COVID-19, must continue to be identified and monitored, even though new lung-aggressive variants have now emerged.
CXR is the imaging method most often recommended to individuals who are experiencing respiratory symptoms [28]. It is especially useful for detecting severe instances of the chest diseases mentioned above, given that patients in the early or intermediate stage of a disease may not present any symptoms when they are examined [29]. It is a basic, fast, and low-risk approach to evaluation. Different chest diseases can be diagnosed using CXR, and algorithms based on artificial intelligence (AI) might be used to help in this process [30]. A CT scan of the chest combines many X-ray projections taken at different angles, so a more complete image of the lungs can be obtained. CXR exams had a lower success rate than CT scans in the early-stage identification of COVID-19 and other chest diseases such as LC, PNEU, and TB [31][32][33]. CT scans have been used both to diagnose the ailment and to track its progression [31]. According to one study [32], more than seventy percent of patients with RT-PCR-confirmed COVID-19 show ground-glass opacities, vascular enlargement, bilateral abnormalities, lower lobe involvement, and posterior predilection on their chest CT scans. According to studies [33][34][35], patients with COVID-19 have ground-glass opacities in the early stages of the disease and lung consolidation in the later stages. Over time, the opacities become rounder and the pulmonary distribution moves toward the periphery. Abnormalities of the same kind have also been associated with other coronavirus infections, including SARS-CoV-1 and MERS-CoV [36]. It is challenging for medical experts to identify chest diseases such as COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, and PNEU. Therefore, an automated and accurate tool is required to classify these chest diseases.
Numerous studies [37][38][39][40] have used cough sounds to identify chest diseases such as COVID-19 and tuberculosis. Kavuran et al. [37] used a DCNN model in conjunction with the continuous wavelet transform (CWT), with scalograms used to depict COVID-19 anomalies. After training and validation of their model, the feature vectors from the fc1000 layer of the network were extracted and provided as input to an SVM classifier. They obtained a specificity of 88.2% and a sensitivity of 96.5%. Another study [38] designed a novel model, DCDD_Net, for the classification of several chest diseases using cough sound images, CT scans, and CXR, achieving an accuracy of 98.9%. Additionally, the studies [39,40] used cough sound images for the classification of COVID-19 and other chest diseases such as pneumonia, tuberculosis, and lung cancer.
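To make the scalogram idea concrete, the following is a minimal, dependency-free sketch of how a continuous wavelet transform turns a one-dimensional signal into a two-dimensional magnitude image. It is an illustration only: the Morlet wavelet, the parameter `w0`, the chosen scales, and the synthetic 50 Hz tone standing in for a cough recording are all assumptions, not details taken from the cited studies.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, dt=1.0):
    """Return |CWT| as a list of rows, one row per scale.

    Direct evaluation of the transform; acceptable for a short
    illustration, far too slow for real recordings.
    """
    n = len(signal)
    rows = []
    for s in scales:
        norm = dt / math.sqrt(s)
        row = []
        for b in range(n):  # b is the time shift (in samples)
            acc = 0j
            for k in range(n):
                acc += signal[k] * morlet((k - b) * dt / s).conjugate()
            row.append(abs(acc * norm))
        rows.append(row)
    return rows


# Synthetic stand-in for a cough recording: 50 Hz tone sampled at 400 Hz.
fs = 400.0
x = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(100)]

# For a Morlet wavelet, the scale matching frequency f is s = w0 / (2*pi*f).
scales = [6.0 / (2 * math.pi * f) for f in (25.0, 50.0, 100.0)]
image = scalogram(x, scales, dt=1.0 / fs)
```

Each row of `image` corresponds to one scale (frequency band) and each column to one time shift; the row whose scale matches the tone carries most of the energy. In practice a wavelet library would be used, and the resulting matrix would be rendered as an image before being passed to a CNN.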
Deep learning (DL) models have changed the classification of diseases and opened up new possibilities for medical professionals [35][36][37][38][39][40][41][42][43][44]. The diagnosis of chest infections [45], the detection of cancer cells [46], the segmentation and identification of brain and breast tumors [47], and gene analysis [48,49] have been significantly improved by the use of convolutional neural networks (CNN) in medical systems. In this study, we propose a novel CNN-based model for the classification of normal cases and eight different chest diseases, i.e., COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, and PNEU, using CXR, CT scans, and cough sounds. In the proposed model, we substitute rank-based average pooling (RBAP) for the conventional max-pooling layer (MPL). Additionally, a batch normalization layer (BANL) is included to address internal covariate shift (ICS), and the multiple-way data generation (MWDG) technique is implemented. A scalogram is used to transform the cough sounds into a visual representation. Using CXRs, CT scans, and cough sound images (CSI), the objective of this study is to consistently classify nine classes (normal and eight chest diseases). This will assist medical professionals in recognizing abnormal patterns caused by these ailments. To the best of our knowledge, this is the first study to propose a single CNN model for classifying a group of chest disorders based on CXR, CT scans, and CSI. We believe that our findings reduce the need for the attending physician to use a separate classification technique for each chest condition. In addition, the proposed model is evaluated against seven well-known pre-trained classifiers, namely Vgg-19 [50], ResNet-101, ResNet-50 [50], DenseNet-121 [51], EfficientNetB0 [16], DenseNet-201 [12], and Inception-V3 [52], on a variety of performance assessment criteria.
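The difference between max pooling and rank-based average pooling can be illustrated with a small sketch. It assumes one common formulation of RBAP, in which the activations of each pooling window are ranked and the `top_t` largest are averaged; the exact variant and threshold used in the proposed model are not specified in this section, so the function name, `window`, and `top_t` are hypothetical.

```python
def rank_based_average_pool(feature_map, window=2, top_t=2):
    """Pool a 2-D feature map over non-overlapping window x window regions.

    In each region the activations are ranked in descending order and the
    top_t largest are averaged. top_t=1 reduces to max pooling and
    top_t=window*window reduces to average pooling.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - window + 1, window):
        out_row = []
        for c in range(0, cols - window + 1, window):
            region = [feature_map[r + i][c + j]
                      for i in range(window) for j in range(window)]
            ranked = sorted(region, reverse=True)
            out_row.append(sum(ranked[:top_t]) / top_t)
        pooled.append(out_row)
    return pooled


# Toy 4 x 4 feature map (illustrative values only).
fmap = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [2.0, 2.0, 9.0, 1.0],
    [0.0, 4.0, 3.0, 5.0],
]
pooled = rank_based_average_pool(fmap, window=2, top_t=2)
# pooled == [[6.0, 5.0], [3.0, 7.0]]
```

Averaging the few strongest activations keeps more information than taking the single maximum while still suppressing weak responses, which is the usual motivation given for rank-based pooling.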
The primary contributions of the present work are discussed below:
3. The proposed model is trained using a scalogram technique that visualizes the cough sounds.
4. Class imbalance is addressed using the synthetic minority oversampling technique (SMOTE) Tomek method.
5. An exhaustive comparison has been carried out between the proposed model, state-of-the-art classifiers, and seven baseline classifiers, namely Vgg-19, ResNet-101, ResNet-50, DenseNet-121, EfficientNetB0, DenseNet-201, and Inception-V3, in terms of performance evaluation measures. The results show that the proposed model outperforms the other models.
6. Ablation experiments have been performed to evaluate the effectiveness of the proposed model.
7. The Grad-CAM heat-map technique has been used to highlight the visual features that drive the classification of the different chest diseases.
8. Using chest X-rays (CXRs), computed tomography (CT) scans, and cough sounds as the major diagnostic tools, we developed a unique framework for identifying individuals with several chest diseases.
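The oversampling step behind SMOTE can be sketched as follows. This is a simplified illustration under stated assumptions: it covers only the interpolation step of SMOTE (the Tomek-link cleaning step is omitted), and the function name, the parameters `k` and `seed`, and the toy two-dimensional points are hypothetical rather than taken from the study.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class in a 2-D feature space (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by that class. Library implementations such as `SMOTETomek` in imbalanced-learn combine this step with Tomek-link removal; whether the study used that implementation is not stated here.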
This study is organized as follows. Section 2 presents the most recent research on DL for classifying a variety of chest ailments using CXR, CT scans, and CSI. Section 3 outlines the materials and methods used in the study. Section 4 presents the extensive experimental results and then discusses them. Section 5 concludes the findings and gives recommendations for future work.

Literature review
In 2020, COVID-19 was recognized as a worldwide pandemic. At the beginning of the same year, a range of computer-assisted diagnostic processes were created to predict the spread of the disease using digital CXR and CT images. These procedures were all based on artificial intelligence (AI), deep learning (DL), and transfer learning (TL) models. A large number of distinct AI models were also used to identify COVID-19 from cough sounds. Table 1 presents the most recent research in this field, focusing on the diagnosis of COVID-19 and other chest-related diseases using a variety of medical imaging methods as well as cough sounds.

DL model for classification of COVID-19 using different medical imaging
This section reviews the most recent published research on DL models used to classify chest diseases with a wide range of medical imaging modalities. Nishio et al. [53] developed a CNN-based EfficientNet model for CXR classification using three publicly accessible benchmark datasets. In classifying COVID-19 images, their model attained an accuracy of 95.12 percent, successfully differentiating pneumonia images from normal images. Malik et al. [24] built a CNN model termed BDCNet, based on Vgg-19, which used CXR for the classification of COVID-19, lung cancer, and pneumonia. BDCNet classified these illnesses correctly 97.10 percent of the time.
Venkataramana et al. [54] developed a CNN classification model for COVID-19 among pneumonia cases (including viral and bacterial infected CXR). Using their method, they were also able to distinguish TB patients from CXRs showing pneumonia. The diagnostic accuracy for bacterial, viral, and COVID-19 infections reached 88% after training. Both TB and pneumonia were identified with significant accuracy by their classification method. Abdul Gafoor et al. [55] developed a simple CNN model that uses CXR to differentiate between people who are infected with COVID-19 and those who are not. On validation, their model reached an accuracy of 94 percent. A CNN (LSTM) model was built to distinguish COVID-19 from influenza in [56], attaining a classification accuracy of 98.0%. The study [58] differentiated between COVID-19 and non-COVID-19 cases using a variety of pre-trained classifiers, including Vgg-16, MobileNet, DenseNet-121, Xception, and NasNet. According to the findings, Vgg-16 performed much better than the other pre-trained models, with an accuracy of 97.68 percent. To extract the features of COVID-19-infected patients from CT scan images, Oğuz et al. [59] used a variety of DL models, including ResNet-50, Vgg-19, SqueezeNet, and Xception. These features were input into machine learning (ML) classifiers, including SVM, DT, and Naive Bayes, so that the COVID-19 test set could be evaluated. The combination of ResNet-50 and SVM achieved a classification accuracy of 98.21%, a substantial improvement over earlier performance. To detect COVID-19 in CXR, Sekeroglu et al. [60] used a CNN model and the dataset that was available at the time of their research. They identified COVID-19 from small amounts of data and skewed CXR images using a CNN without preprocessing and with a reduced number of network layers, achieving an accuracy of 98.50 percent.
Using CT scans, Zhao et al. [61] developed a DL model for the diagnosis of COVID-19 that achieved an accuracy of 98%. Sakib et al. [62] developed a DL-based chest radiograph classification (DL-CRC) framework to classify COVID-19 patients into two categories: abnormal and normal. Their DL model, built on the DARI approach and generic data augmentation, had an accuracy of 93.94 percent. Taresh et al. [63] combined CNN and TL-based methods with VGG-16 to develop their model for recognizing COVID-19 in CXR images. MobileNet achieved the highest accuracy, 98.28 percent, when analyzed with Vgg-16 as its point of departure. Using CXR images and a variety of CNN models, including MobileNet, Inception-V3, and ResNet-50, Ahmad et al. [64] built a DL model to recognize COVID-19. Inception-V3 performed best, with an accuracy of 95.75 percent and an F1 score of 91.4 percent.
When developing a CNN model for the detection of COVID-19, Ravi et al. [65] relied on CT and CXR datasets as their primary sources of data. Chowdhury et al. [66] developed a model, built with a pre-trained DL technique, that diagnoses COVID-19 pneumonia from CXR images. They assembled their database from the findings of earlier research and achieved 97 percent accuracy in classifying subjects into the appropriate categories. To classify CT images associated with COVID-19, Mei et al. [67] used a convolutional neural network (CNN) with a support vector machine (SVM). Applying this architecture to CT scans allowed COVID-19 to be identified with improved precision, and they reported an area under the curve (AUC) of 0.92. Hosny et al. [68] developed a hybrid model for COVID-19 identification that used two types of CT scans in addition to CXR images. They fused several kinds of images to save processing time and storage space. Compared with earlier analogous methods, their method reached an accuracy of 93.2% on CXR and 95.3% on CT scans. Malik et al. [30] proposed a novel CDC_Net for the classification of COVID-19, LC, pneumothorax, TB, and pneumonia from chest X-ray images. CDC_Net incorporates residual network ideas and dilated convolution, and achieved significant classification accuracy on these diseases.
In [69], the researchers proposed TL as a strategy for recognizing COVID-19 from X-ray and CT scan images, motivated by the fact that early screening by CXR can give helpful information for identifying individuals who could be infected with COVID-19. The authors of the study [70] investigated how well a CNN detects COVID-19 from CT scans and CXR images, and reached an accuracy of 98.5%. Harmon et al. [43] used a DenseNet-121 network to distinguish COVID-19 from other pneumonia. Several datasets were used to evaluate the classification, and the method attained an accuracy of 90.80 percent in separating COVID-19 from pneumonia on CT images. Bhandary et al. [71] modified the AlexNet model by replacing the last layer with an SVM, and named the resulting model modified AlexNet (MAN). This was done to maximize accuracy. The authors investigated how well this design performed for COVID-19 diagnosis. In addition, CT images were used with the proposed network for the diagnosis of lung cancer. The MAN achieved an accuracy of 97.27 percent. To differentiate COVID-19 from other chest ailments, the study [34] introduced a DMFL_Net model for medical diagnostic image processing. The DMFL_Net model collects data from a variety of hospitals, builds the model with the help of DenseNet-169, and makes accurate predictions from information that is kept confidential and disclosed only to parties that have been granted access. In-depth tests with CXR showed that the proposed model not only achieves an accuracy of 92.45% but also maintains the confidentiality of the data for a wide range of clients. Topff et al. [72] developed a novel CNN [73] model for the classification of COVID-19, achieving a sensitivity of 0.87 and a specificity of 0.94. Lande et al. [74] designed a DL model for topic modeling of the Omicron [75] variant of COVID-19. They extracted data from Twitter and achieved an accuracy of 90.0%.
Alshazly et al. [76] proposed a model based on a transfer learning approach for the classification of COVID-19 cases using CT scans. They used two public datasets, namely the SARS-CoV-2 CT-scan and the COVID-19-CT datasets, and achieved an F1-score of 92.90%. The study [77] developed two novel DCNN models, CovidResNet and CovidDenseNet, to diagnose COVID-19 based on CT images, achieving a classification accuracy of 93.87%. The study [78] combined an ensemble DL model with the Internet of Things (IoT) for screening suspected COVID-19 cases and yielded 98.98% accuracy. Hamza et al. [79] proposed a CNN-LSTM and improved max value features optimization framework for COVID-19 classification and attained 93.4%. Additionally, the work [80] proposed a model based on two transfer learning models, namely EfficientNet-B0 and MobileNet-V2, which were fine-tuned according to the target classes and then trained using Bayesian optimization (BO). Their proposed model yielded a classification accuracy of 98.8%.

DL model for diagnosis of chest diseases using cough sounds
This section describes work on identifying COVID-19 from cough sounds using a variety of DL approaches. Hemdan et al. [81] present a hybrid framework, CR-19, that applies a variety of machine learning (ML) approaches to cough audio signals to promptly detect and diagnose COVID-19. The use of ML techniques together with a genetic algorithm significantly increased the accuracy of the framework, which reaches 92.19 percent. In the study [82], six pre-trained classifiers were used to classify COVID-19 cough sounds: NasNet-Mobile, GoogleNet, ResNet-18, ResNet-50, MobileNet-V2, and ResNet-101. They first used the spectrogram method to convert the sound data into a visual representation. The pre-trained models were then applied to extract features and classify each sound as either COVID-19 or non-COVID-19. According to the results, ResNet-18 outperformed the other classifiers with an accuracy of 94.90 percent.
Nessiem et al. [83] used a CNN to classify the breathing and coughing sounds of COVID-19 patients in order to determine whether or not a patient is infected with COVID-19. The standard technique serves as a benchmark for this approach, which is broader in scope and application. With the data currently available, their DL model attained an accuracy of 80.7 percent. Chowdhury et al. [84] recommend ensemble-based multi-criteria decision-making (MCDM) as a method for selecting the most efficient ML algorithms for COVID-19 cough classification. The strategy was validated on four distinct cough datasets, namely Cambridge, Coswara, Virufy, and NoCoCoDa. Assessing the acoustic properties of the cough sample is the first phase of the proposed technique for determining whether a cough sample indicates COVID-19. The Extra-Trees classifier yielded very encouraging results (AUC: 0.95 and Recall: 0.97). Hee et al. [85] developed classifiers that differentiate between children who have asthma and those who do not. They acquired cough samples from 1192 asthmatic patients and 1240 healthy children. MFCCs, among other features, were extracted from the audio. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) was then used to build the final ML classifier. The overall sensitivity of the ML classifiers is 82.81 percent, while their specificity is 84.76 percent.
According to a study [86], either speech or images can be analyzed to determine whether or not COVID-19 is present. Three experiments were conducted using speech-based and image-based models. A success rate of more than 98% was achieved when LSTM was used to identify the patient's cough, voice, and respiratory patterns. CNN models such as VGG-16, DenseNet-201, ResNet-50, Inception-V3, InceptionResNet-V2, and Xception were used in the second phase of testing for the classification of CXR images. The Vgg-16 model, which outperforms all the other CNN models, has an accuracy of 85.25 percent without fine-tuning, which increases to 89.54 percent with fine-tuning. Aly et al. [87] used the Coswara dataset, which includes nine separate audio categories that users have recorded and labeled according to their COVID-19 status. These include a cough that produces mucus, as well as regular breathing and speaking patterns. Training on a large number of audio samples made the CNN model better able to identify COVID-19 coughs. According to their findings, binary classifiers can achieve an AUC of 0.964 and an accuracy of 96%. Using the methodology presented in [88], non-COVID-19 sounds may be discriminated from COVID-19 sounds. For training and evaluation, they used a total of 50 groups, with each group including 3,597 sounds unrelated to coughing and 1,838 coughs. According to the study, the DL-based multiclass classifier has an overall accuracy of 92.64%. Using Mel-frequency cepstral coefficients (MFCC), Bansal et al. [89] developed a CNN model to recognize COVID-19 audio. Two learning-based methods were implemented more rapidly with the help of the Vgg-16 architecture. The diagnostic tool had an accuracy of 70.58 percent and a sensitivity of 81%.
Many studies [8-13, 22-24, 26, 29-34, 41, 43] have found that the symptoms of many chest disorders, such as COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, and PNEU, are similar to one another. This makes diagnosis with CXR and CT scans difficult for medical practitioners, who must identify and distinguish the many chest ailments. Similarly, researchers [81,100,101] have sought to diagnose various chest ailments from the patient's cough. However, cough sounds are similar across different disorders. As a result, there is a clear need for an automated framework based on DL models that can identify chest ailments using CXR, CT scans, and cough sounds. Previous research [43,91,93,95-99,102-105] had the primary objective of differentiating COVID-19 cases from non-COVID cases using CXR images and CT scans as diagnostic tools. A few works [34,24,53,94,90,92] have used CXR images to distinguish COVID-19 from pneumonia, such as viral and bacterial infections, as well as from TB. However, only limited work [81,100,101] has shown evidence to support the diagnosis of PNEU and COVID-19 based on cough sounds, and DL models have not yet produced any evidence to support the diagnosis of LC, ATE, COL, TB, and EDE based on cough sounds. To overcome these limitations, this study provides a DL framework that detects different chest diseases based on CXR images, CT scans, and CSI.

Materials and methods
The goal of this study is to develop a CNN that is superior to the current state of the art. The improvements included in this CNN are BANL, dropout, RBAP, and MWDG. The purpose of using a CNN is to obtain the particular image-level representation (IIR). A total of 24 publicly available datasets were used to train the proposed model. For better training, we fixed the size of the CXR, CT scan, and CSI images to 299 × 299 pixels. The experiment was carried out for a maximum of 50 epochs with a batch size of 32. After running through all of the epochs, the proposed model achieved the required level of accuracy in training and validation. The multiclass confusion matrix was used to test the classification performance of the proposed model against that of seven baseline classifiers.
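As an illustration of how performance measures are read off a multiclass confusion matrix, the sketch below computes overall accuracy and per-class sensitivity, specificity, and precision in a one-vs-rest fashion. The function name and the 3-class counts are hypothetical and chosen only for illustration; they are not results from this study.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns overall accuracy and one-vs-rest metrics for every class.
    Assumes every class has at least one true and one predicted sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c missed
        fp = sum(cm[r][c] for r in range(n)) - tp    # others labelled as c
        tn = total - tp - fn - fp
        metrics.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        })
    return accuracy, metrics


# Hypothetical 3-class counts, for illustration only.
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
accuracy, metrics = per_class_metrics(cm)
```

With nine classes the same function applies unchanged to a 9 × 9 matrix, giving one sensitivity and specificity value per disease class.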

Datasets description
This section has two subsections. The first describes the chest disease datasets of CXR and CT scan images. The second is devoted to the chest disease cough sound datasets.

Chest diseases CXR and CT scan image datasets.
To train and validate the DL models using CXR, a total of 11 publicly available datasets on various chest disorders were collected from a wide variety of sources. At the beginning of our research, we obtained 930 COVID-19-positive CXR from a GitHub repository established by Cohen et al. [106]. This repository gathers CXR images from a large number of hospitals and other public sources. Patients who tested positive for COVID-19 were, on average, approximately 55 years old. However, the complete metadata are not provided in this study. A total of 43071 COVID-19-positive CXR were collected from the SIRM database [107], the TCIA [108], radiopaedia.org [109], Mendeley [110,111], and a GitHub source [112]. The pneumonia image database was retrieved from the RSNA [113]. This dataset contains a total of 5216 CXR; 1349 are assessed to be within the normal range, while the remaining 3867 show pneumonia. The CXR images in the lung cancer dataset were retrieved from [113,114]. This dataset contains around 5,000 CXR. The CXRs of healthy persons were taken from the Kaggle archive [115]. A total of 3205 CT images of pneumothorax were collected from the publicly available SIIM-ACR pneumothorax database [116]. A total of 18,663 CXR were obtained from the NIH [117], comprising 6331 images of edema, 5789 images of atelectasis, and 6543 images of consolidation lung. Finally, a total of 700 CXR images of TB-infected patients were gathered [118-120].
For training and validating the proposed DL model on CT scans, a total of 8 publicly available databases are used. The first dataset [121] consists of people whose COVID-19 infections were verified by chest CT scans performed without contrast enhancement. According to the patients' medical histories, hypertension, diabetes, and pneumonia or emphysema were the most common co-occurring conditions. Patients with a positive RT-PCR test result for COVID-19 and accompanying clinical symptoms were imaged in an inpatient environment between March 2020 and January 2021. No intravenous contrast was given during the CT exams, which were performed on a NeuViz 16-slice CT scanner in "Helical" mode. The dataset contains a total of 35,635 CT scan images, including 9,367 CT scans of patients regarded as normal. The CC-19 dataset [122] comprises 34,006 CT scan slices, all voluntarily provided by 89 individuals attending three separate universities. Of these, 28,395 slices belonged to individuals who had a positive COVID-19 test result. The CT scan slices of the 89 individuals were acquired with three different scanners (Brilliance ICT, Somatom Definition Edge, and Brilliance 16P CT). Among the 89 patients investigated, 68 showed evidence of COVID-19, while the remaining 21 showed no indication of COVID-19 at any point during the investigation. A total of 3000 LC CT scan images were collected from the publicly available dataset provided in ref [24]. We collected a total of 412 CT scan images of pneumonia-infected lungs from [123]. From the open-source dataset provided in ref [124], we extracted a total of 1700 TB CT scan images. A total of 944 normal CT scan images were collected from [125]. We collected the CT scan images of various chest conditions such as EDE, ATE, COL, and PNEUTH from [126-128].

Chest diseases cough sound datasets.
The Coswara project aims to develop a COVID-19 diagnostic tool based on sounds produced by the respiratory system, coughing, and speaking [129]. The participants were asked to submit recordings of themselves coughing through a web-based data collection tool that could be accessed from a mobile device. The audio data include shallow and deep coughing, fast and slow breathing, fast and slow phonation of vowels, and spoken digits. In addition, information on the participant's age, gender, geographic region, current health state, and prior medical conditions is recorded. The audio was recorded at a sampling frequency of 44.1 kHz, and all continents except Africa were included in the sample set. COVID-19 cough sounds were also collected from the Sarcos dataset [130]. A total of 44 cough sounds were collected from this database, of which 18 are COVID-19 cough sounds and 26 are cough sounds of healthy persons. The cough sounds of TB-infected patients were collected from [131]. This data collection included coughs from 16 TB patients and 35 non-TB patients, with the majority of participants being men aged 38 on average. A total of 402 TB cough sounds were collected from the 16 patients. Two research teams from Portugal and Greece constructed the Respiratory Sound Database [132]. It contains 920 annotated recordings ranging in duration from 10 to 90 seconds. These recordings were obtained from 126 different patients. There are a total of 5.5 hours of recordings covering 6898 respiratory cycles; 1864 include crackles, 886 include wheezes, and 506 have both. The data comprise both clean and noisy respiratory sound recordings that imitate real-world settings. The data collection contains 423 cough sounds associated with pneumonia, 100 cough sounds associated with ATE, 92 cough sounds associated with COL, 42 cough sounds associated with edema, and 59 cough sounds associated with pneumothorax. Finally, 393 cough sounds from LC patients were collected [133]. Detailed statistics of the cough sound datasets are presented in Table 3.

Data Pre-processing
This section presents the process of converting cough sound into an image using the scalogram technique, the use of synthetic minority oversampling technique (SMOTE) Tomek to handle the imbalance class problem, and splitting the dataset for training, validation, and testing.

3.2.1. Converting cough sound into an image using scalogram.
The scalogram of a wave is the graphic representation of the absolute values of the coefficients of its Continuous Wavelet Transform (CWT) [134]. In this investigation, the scalogram technique is applied in two steps. First, the 1-D cough sounds of the chest disease data undergo noise reduction. Second, CWT-based 2-D scalograms are computed from the preprocessed signals. As can be seen in Fig 3, the CWT transforms the cough sounds from the time domain to the time-frequency domain. For noise reduction, convolution with a bandpass filter (BPF) is an efficient tool for removing both high- and low-frequency noise. The CWT, which is closely related to the Fourier transform, measures the degree of similarity between a wave and an analyzing function by means of inner products. Using Eq (1), the CWT of the function T(S) at a scale a > 0 is computed. The father signal (analyzing wavelet), θ(S), is a continuous function in both the time domain and the frequency domain. Here a represents the continuously varying scale parameter, while b represents the position parameter. The CWT coefficients form a matrix organized by scale and position. The father signal serves as the root from which the children signals are generated. In the CWT, the cough sound signal is analyzed using the scale parameter in conjunction with the father signal [135,136].
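The scalogram step described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this study: the real-valued Morlet analyzing wavelet, the parameter `w0`, the scale grid, and the synthetic 10 Hz test tone are all assumptions, and the BPF noise-reduction step is omitted.

```python
import math

def morlet(t, w0=5.0):
    """Real-valued Morlet wavelet (assumed choice of analyzing wavelet)."""
    return math.exp(-0.5 * t * t) * math.cos(w0 * t)

def cwt_scalogram(signal, scales):
    """Return |CWT| as a list of rows: one row per scale a, one column per position b."""
    n = len(signal)
    rows = []
    for a in scales:                      # a: scale parameter (in samples)
        row = []
        for b in range(n):                # b: position parameter (in samples)
            acc = 0.0
            for s in range(n):
                acc += signal[s] * morlet((s - b) / a)
            row.append(abs(acc) / math.sqrt(a))   # 1/sqrt(a) normalisation
        rows.append(row)
    return rows

# Hypothetical 1-second test tone at 10 Hz, sampled at 200 Hz (not real cough data).
fs = 200
signal = [math.sin(2 * math.pi * 10 * k / fs) for k in range(fs)]
scales = [2.0, 4.0, 8.0, 16.0, 32.0]
scalogram = cwt_scalogram(signal, scales)

# Average magnitude per scale; the row matching the tone's frequency dominates.
mean_magnitude = [sum(row) / len(row) for row in scalogram]
```

Each row of `scalogram` corresponds to one scale and each column to one position, so the nested list can be rendered directly as a 2-D image. In practice a library routine such as the CWT in PyWavelets would likely replace the triple loop.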

Handling imbalanced class dataset.
In an imbalanced dataset, one class contains the majority of the instances, while the other classes contain only a small number. This uneven class distribution leads to the misclassification of minority-class examples, since the classifier tends to be biased toward the majority class [137]. As Tables 2 and 3 show, most of the lung disease classes of the CXR, CT scan, and cough sound datasets are imbalanced. We therefore use SMOTE Tomek to increase the number of images in the minority lung disease classes. The total number of CSI, CXR, and CT scan images associated with lung diseases after applying SMOTE Tomek is shown in Table 4.
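To make the oversampling idea concrete, the sketch below shows only the SMOTE interpolation step on toy feature vectors: a synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration under stated assumptions, not the pipeline used in this study: the function name `smote_oversample`, the value of `k`, and the toy points are hypothetical, images would first have to be flattened into vectors, and the Tomek-link cleaning step is omitted.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by SMOTE-style interpolation (no Tomek cleaning)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance), excluding x.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((pi - xi) ** 2 for pi, xi in zip(p, x)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Toy minority class: four 2-D feature vectors on the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region already occupied by that class. A library implementation such as `SMOTETomek` in imbalanced-learn combines this step with Tomek-link removal.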
3.2.3. Image enhancement and pre-processing.
The dataset includes images of eight distinct lung disorders as well as images of normal conditions. These images are CXRs, CT scans, and cough sound images (CSI). First, we converted the CXR, CT scan, and CSI images to grayscale by keeping only the luminance information. Using Eqs (2 and 3), this gave us the grayscale image set $D_2$. Second, the minimum and maximum grayscale values of each image $d_2(a)$ are denoted by $G_{\min}(a)$ and $G_{\max}(a)$, respectively,
where $(p_1, p_2)$ denotes the coordinates of a pixel in the CXR, CT scan, or CSI image $d_2(a)$. Using Eq (6), $d_3(a)$ represents the new HTS image.
After that, we obtain the HTS image set $D_3$. Third, the CXR, CT scans, and CSI were cropped to remove the text and patient information before training the proposed model. Thus, we get the cropped lung disease dataset $D_4$ using Eqs (7 & 8),
where R represents the cropping process. The parameters $z_t$, $z_b$, $z_l$, and $z_r$ represent the top, bottom, left, and right margins, respectively, of the CXR, CT scans, and CSI. These parameters are used to crop the image in units of pixels. Finally, each image was resized to $W_5 \times H_5$, giving the dataset $D_5$. Resizing has two advantages: it reduces storage space, and a smaller-size dataset helps prevent the proposed classification system from overfitting. We set $W_5 = H_5 = 299$ by trial and error. We find that a smaller size makes the images blurry, which also reduces the classifier's performance. On the other hand, a larger size results in overfitting, which hampers the performance. Table 5 presents a comparison of the sizes and amounts of storage required by each image $d_s(a)$, $s = 1, 2, 3, \ldots$, $a = 1, 2, \ldots, |D|$, at each stage of the preparation pipeline. After this preprocessing, the storage cost of each image is reduced to around 1.98% of its original value. The compression ratio (CMR) of the $a$th image in its final stage $D_5$ compared to its initial stage $D_1$ was computed as follows:

$$\mathrm{CMR}_{storage}(a) = \frac{\mathrm{byte}[d_5(a)]}{\mathrm{byte}[d_1(a)]}, \qquad \mathrm{CMR}_{size}(a) = \frac{\mathrm{size}[d_5(a)]}{\mathrm{size}[d_1(a)]}$$

Hence, we get $\mathrm{CMR}_{storage}(a) = \mathrm{CMR}_{size}(a) = 1.98\%,\ \forall a = 1, 2, 3, \ldots, |D|$.

Proposed model
The conventional approaches to DL have produced remarkable results in illness diagnosis [138,139]. The CNN is an innovative form of artificial neural network. The proposed CNN is made up of convolutional layers (ConvLs), pooling layers (PLs), non-linear activation methods (NLAMs), and fully connected layers (FCLs). The primary function of the proposed CNN model is to convolve information. ConvLs perform convolution in two dimensions, along the width and height directions [140]. Note that the weights of the proposed model start as random values and are then learned from the data during network training. A ConvL operation in the proposed model takes three steps: (i) kernel-based convolution (KBC); (ii) stack; and (iii) NLAM. It takes an input matrix $I$ and kernels $K_p$, $\forall p \in [1, 2, 3, \ldots, P]$, and produces an output $O$ (here $O$ refers to the result of the full three-step convolution layer, as opposed to the result of a single convolution). The term ConvL denotes the convolution operation itself, whereas the phrase "complete convolution layer" refers to the combination of ConvL, stack, and NLAM. In addition, we have used the same color to symbolize both the input and the output of the ConvLs, because the output is used as the input of the ConvLs that follow.
For each kernel $K_p$, the result of the convolution is calculated using Eq (11).
where P refers to the operation on the stack. Finally, matrix D is input into the NLAM, which produces the final matrix Y, as given in Eq (13).

$$Y = \mathrm{NLAM}(D) \qquad (13)$$
As shown in Eq (14), we can compute the respective sizes (Z) of the three primary components (input, kernel, and output),
where the three elements (W, H, and M) denote the dimensions of the matrix (width, height, and channels) [141]. The subscripts I and K designate the input and the kernel, respectively, whereas the output is denoted by Y. The letter P stands for the total number of filters. Note that $M_I = M_K$, which indicates that the number of input channels $M_I$ should be equal to the number of kernel channels $M_K$. The movement of the filters is determined by the padding $U_p$ and the stride $U_s$. By applying Eq (15) [142], we can compute the dimensions ($W_Y$, $H_Y$, $M_Y$) of the output matrix Y, where $\lfloor \cdot \rfloor$ denotes the floor function. The number of output channels $M_Y$ should correspond to the number of filters P. In the last step, we used the rectified linear unit (ReLU) function as the NLAM [143]. If $f_{ab}$ is an entry in the matrix D, the ReLU output is obtained from Eqs (16 & 17). ReLU is preferred over more traditional NLAMs such as the sigmoid function (SMF) and the hyperbolic tangent function (HTF). Eq (18) and Eq (19) are used to measure the SMF and HTF, respectively.
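The output-size computation and the activation functions discussed above can be illustrated with a short Python sketch. The function below uses the standard floor-based formula, output = floor((input - kernel + 2 × padding) / stride) + 1, which is assumed here to be the form of Eq (15). The 3 × 3 kernel with padding 1 is likewise a hypothetical setting, chosen only because it reproduces the 299 × 299 × 32 output size reported for the first ConvL.

```python
import math

def conv_output_size(w_in, h_in, w_k, h_k, n_filters, padding=0, stride=1):
    """Output (width, height, channels) of a convolution or pooling window.

    Assumed standard form: floor((in - kernel + 2 * padding) / stride) + 1.
    The number of output channels equals the number of filters P.
    """
    w_out = (w_in - w_k + 2 * padding) // stride + 1
    h_out = (h_in - h_k + 2 * padding) // stride + 1
    return w_out, h_out, n_filters

def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid function (SMF), standard definition."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    """Hyperbolic tangent function (HTF), standard definition."""
    return math.tanh(x)

# Hypothetical first ConvL: 3 x 3 kernel, padding 1, stride 1, 32 filters.
first_conv = conv_output_size(299, 299, 3, 3, 32, padding=1, stride=1)
# 2 x 2 pooling with stride 2 and no padding on that output.
first_pool = conv_output_size(299, 299, 2, 2, 32, padding=0, stride=2)
```

Under these assumptions `first_conv` is `(299, 299, 32)` and `first_pool` is `(149, 149, 32)`, matching the sizes $B_1$ and $B_2$ quoted for model P(1).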
3.3.1. Improvement 1: Adding BANL and dropout to the proposed model.
The motivation for the BANL is the need to address the effect of randomness on the distribution of inputs to internal CNN layers during training. This effect is known as internal covariate shift (ICS) [144]. ICS reduces the overall effectiveness of a CNN [145]. This study uses the BANL to normalize the inputs of those internal layers, $I = \{L_a\}$, over every mini-batch (of size $|I|$), so that the batch-normalized output $B = \{b_a\}$ has a consistent distribution. Eq (20) expresses the BANL function:

$$\underbrace{\{L_a,\ a = 1, 2, 3, \ldots, |I|\}}_{I} \mapsto \underbrace{\{b_a,\ a = 1, 2, 3, \ldots, |I|\}}_{B} \qquad (20)$$

During training of the proposed model, Eq (21) and Eq (22) were used to determine the empirical mean $M_e$ and the empirical variance $V_e$, respectively.
Using Eq (23), the input $L_a \in I$ was first transformed into the standard value $\hat{L}_a$:

$$\hat{L}_a = \frac{L_a - M_e}{\sqrt{V_e + d_s}} \qquad (23)$$

where $d_s$ is the stability factor in the denominator of Eq (23), used to improve numerical stability. At this point, $\hat{L}_a$ has zero mean and unit variance. Typically, Eq (24) is used to make the CNN more expressive [146]. In this context, "expressive" refers to the network's expressive capacity, that is, its capability to represent functions.

$$b_a = P_1 \times \hat{L}_a + P_2, \qquad a = 1, 2, 3, \ldots, |I| \qquad (24)$$

where $P_1$ and $P_2$ are two parameters that are learned during training. The transformed output $b_a \in B$ is then sent to the subsequent layer, while the normalized $\hat{L}_a$ remains internal to the current layer. At the inference stage, mini-batches are no longer available. Therefore, instead of computing $M_e$ and $V_e$, we compute the population mean $M_p$ and the population variance $V_p$, and obtain the output $b_a$ at the inference stage according to Eq (25).
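The batch normalization steps described above can be sketched in plain Python for a single feature. This is a minimal illustration rather than the layer used in the proposed model: the empirical mean and variance follow the usual mini-batch definitions (assumed to match Eqs (21) and (22)), the inference function uses the standard population-statistics form (assumed to match Eq (25)), and the default values of `p1`, `p2`, and `d_s` are hypothetical.

```python
import math

def batch_norm_train(batch, p1=1.0, p2=0.0, d_s=1e-5):
    """Normalise one mini-batch of scalars; return outputs b_a, M_e, and V_e."""
    n = len(batch)
    m_e = sum(batch) / n                                   # empirical mean
    v_e = sum((x - m_e) ** 2 for x in batch) / n           # empirical variance
    normalized = [(x - m_e) / math.sqrt(v_e + d_s) for x in batch]   # Eq (23)
    outputs = [p1 * z + p2 for z in normalized]            # Eq (24)
    return outputs, m_e, v_e

def batch_norm_infer(x, m_p, v_p, p1=1.0, p2=0.0, d_s=1e-5):
    """Inference-time output using population mean M_p and variance V_p (assumed form)."""
    return p1 * (x - m_p) / math.sqrt(v_p + d_s) + p2

# Toy mini-batch (not real activations).
outputs, m_e, v_e = batch_norm_train([1.0, 2.0, 3.0, 4.0])
out_mean = sum(outputs) / len(outputs)
out_var = sum((b - out_mean) ** 2 for b in outputs) / len(outputs)
```

With the default $P_1 = 1$ and $P_2 = 0$, `out_mean` is approximately zero and `out_var` approximately one, which is the zero-mean, unit-variance property stated in the text. In a TensorFlow/Keras model the same behaviour is provided by the built-in batch normalization layer.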
In addition, a dropout layer (DPL) is added before the FCL. Dropout is a regularisation strategy that randomly removes neurons during training, which helps protect CNN models from overfitting. The study [147] introduced the concept of dropout neurons (DPN), in which neurons are randomly dropped during training and their associated weights are set to zero. Let the set of all fully connected neurons be denoted by {R}, the set of dropped neurons by {N}, and the set of reserved neurons by {-}. The DPN selections are made randomly, and the retention probability ($L_{rp}$) is determined by Eq (26).
Consider a neuron N(a, b) with initial weights w(a, b). During training, the weights $w_Z(a, b)$ of the neuron are updated following Eq (27). During inference, we run the CNN without the DPL; however, the weights $w_F(a, b)$ of the FCLs that employ DPN are downscaled by a factor of $L_{rp}$ (see Eq (28)).

$$w_F(a, b) = L_{rp} \times w(a, b) \qquad (28)$$

The square of the retention probability ($L_{rp}^2$) is the compression ratio of learnable weights (CLW). Eq (29) is used to measure the CLW, where H represents the total number of learnable weights before the DPL, and $L_R$ represents the total number of learnable weights after the DPL.
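The dropout behaviour described above can be sketched as follows. The training function assumes that Eq (27) keeps a weight unchanged with probability $L_{rp}$ and sets it to zero otherwise; the function names, the seed, and the toy weights are hypothetical and serve only to illustrate Eq (28) and the CLW.

```python
import random

def dropout_train(weights, l_rp, seed=0):
    """Training: keep each weight with probability L_rp, otherwise set it to zero."""
    rng = random.Random(seed)
    return [w if rng.random() < l_rp else 0.0 for w in weights]

def dropout_infer(weights, l_rp):
    """Inference: no neurons are dropped; weights are downscaled by L_rp (Eq (28))."""
    return [l_rp * w for w in weights]

def clw(l_rp):
    """Compression ratio of learnable weights, equal to L_rp squared."""
    return l_rp ** 2

# Toy weights (not taken from the trained model).
kept = dropout_train([1.0] * 1000, l_rp=0.5, seed=1)
scaled = dropout_infer([2.0, 4.0], l_rp=0.5)
```

Here roughly half of the entries in `kept` are zero, and `scaled` is `[1.0, 2.0]`. The squared factor in `clw` reflects that a weight between two dropout-affected layers survives only when the neurons at both of its ends are retained.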

Improvement 2: Adding RBAP to the proposed model.
The pooling function takes the output of a layer (particularly a ConvL) and replaces it at each position with a summary statistic of the nearby outputs. Pooling produces activations in the pooled map that are less sensitive to the precise locations of structures in the CXR, CT scan, and CSI than the activations of the original feature map. Let P be a region to be pooled, of size s × s, where s is the pooling size. The pixels contained in region $P = \{P_{a,b}\}$, $1 \le a, b \le s$, are given by Eq (30).

$$P = \begin{bmatrix} P_{1,1} & \cdots & P_{1,s} \\ \vdots & \ddots & \vdots \\ P_{s,1} & \cdots & P_{s,s} \end{bmatrix} \qquad (30)$$
The $N_2$-norm pooling algorithm, abbreviated as $N_2P$, computes the $N_2$ norm of the region P. With O denoting the output pooling matrix, the $N_2P$ output is $O_{N_2P}(P) = \sqrt{\sum_{a,b=1}^{s} P_{a,b}^2}$. In this study, a constant $1/|P|$ has been added, where $|P|$ denotes the number of elements in the region P. Adding this constant under the square root, as in Eq (31), changes neither training nor inference.

$$O_{N_2P}(P) = \sqrt{\frac{\sum_{a,b=1}^{s} P_{a,b}^2}{|P|}} \qquad (31)$$

Average pooling (AvgP) computes the mean value of the region P, as given in Eq (32).

$$O_{AvgP}(P) = \frac{1}{|P|} \sum_{a,b=1}^{s} P_{a,b} \qquad (32)$$

The MPL (see Eq (33)) selects the maximum value of the region P.

$$O_{MPL}(P) = \max(P) = \max_{a,b=1}^{s} P_{a,b} \qquad (33)$$

We added the RBAP to the proposed model for the following reasons.
The study [148] presented three different rank-based pooling algorithms. These methods have several advantages over traditional pooling methods: (1) the ranking list is invariant to small changes in activation values; (2) significant activation values can be easily distinguished by their ranks; and (3) the use of ranks avoids the scale problems that arise in value-based pooling. The RBAP is a rank-based pooling strategy that has been adopted in a wide variety of fields because of its superior performance compared to other state-of-the-art approaches. RBAP was incorporated into a CNN by Wang et al. [147] to detect cerebral microbleeds using susceptibility-weighted imaging. They achieved a 97.18% precision rate. According to the study [140], which compared RBAP to traditional pooling methods, a significant benefit of RBAP is that it assigns both rankings and weights to the activations.
First, RBAP determines the rank matrix (RM) from the values of the individual elements $P_L \in P$. As given in Eq (34), lower ranks $R_L \in [1, 2, 3, \ldots, K^2]$ are assigned to higher values ($P_L$).
Eq (35) adds a constraint to Eq (34) for tied values ($P_{L1} = P_{L2}$).
$O_{RBAP}(P)$ is the output of RBAP; it averages the $R_T$ largest activations, as given in Eq (36),
where $R_T$ represents the rank threshold. If $R_T = 1$, RBAP reduces to the MPL. If instead $R_T = K^2$, RBAP becomes AvgP. RBAP can therefore be regarded as a compromise between the MPL and AvgP strategies. Note that $N_2P$, AvgP, MPL, and RBAP all operate on each slice independently.
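The behaviour of RBAP and its two limiting cases can be checked with a small Python sketch. It assumes that Eq (36) averages the activations whose rank does not exceed the rank threshold $R_T$; the function name `rbap` and the toy 2 × 2 region are hypothetical.

```python
def rbap(region, rank_threshold):
    """Rank-based average pooling over one pooling region.

    Activations are ranked in descending order (rank 1 = largest value) and
    the rank_threshold highest-ranked activations are averaged.
    """
    values = sorted((v for row in region for v in row), reverse=True)
    if not 1 <= rank_threshold <= len(values):
        raise ValueError("rank_threshold must lie between 1 and the region size")
    top = values[:rank_threshold]
    return sum(top) / len(top)

# Toy 2 x 2 pooling region (K = 2, so K^2 = 4 activations).
region = [[1.0, 4.0],
          [2.0, 3.0]]

as_max_pooling = rbap(region, 1)   # R_T = 1 reduces to max pooling
as_avg_pooling = rbap(region, 4)   # R_T = K^2 reduces to average pooling
in_between = rbap(region, 2)       # compromise between the two
```

For this region the three results are 4.0, 2.5, and 3.5. Sorting handles tied values without special treatment, because equal activations contribute equally to the average regardless of which one receives the lower rank.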

Improvement 3: Adding MWDG to the proposed model.
Data augmentation (DAUG), data generation (DGEN), ensemble approaches (EAP), and regularisation (REG) are four types of solutions that can be used to address the class imbalance in the chest disease datasets and the lack of generalization (LGEN) problem. DAUG produces synthetic CXR, CT scan, and cough sound spectrogram images by altering previously collected data, for example by cropping or rotating it. DGEN generates data by sampling from a data source; the SMOTE algorithm [149] is a representative DGEN method. EAP combines the results of several models to provide better predictive performance than any single model [150]. REG focuses mainly on the weights of the models. Large weights make CNN models unstable, because even a small change in the inputs results in significant shifts in the output. Small weights are generally regarded as more regular (or less specialized) than large ones; this method is therefore called weight regularization (W-REG). Among these options, DAUG is used here because of its simplicity and ease of implementation.
For this study, we propose a multiple-way data generation (MWDG) method, denoted $M_{DAUG}$. Our $M_{DAUG}$ differs from standard DAUG in that it makes use of several different DAUG methods ($M_{DAUG} > 10$). Assume that the pre-processed dataset is $D_5 = \{d_5(1), d_5(2), d_5(3), \ldots, d_5(|D|)\}$. The pre-processed chest disease dataset is divided into three subsets, training ($Z_{Train}$), validation ($Z_{Val}$), and testing ($Z_{Test}$), as given in Eq (37).
where $Z_{Train}$ denotes the set of training images $z_{Train}(i)$.
• Rotation. The rotation angle vector $F_{Rot}$ skips the value 0. Eq (38) and Eq (39) are applied to perform the rotation on the $Z_{Train}$ portion, where Rot represents the rotation process and $C_n$ denotes the new images generated by the rotation.
• Horizontal Shift Transform (HST). New images $C_n$ were produced using the HST (see Eq (40)), where HST refers to the horizontal shift transform. The HST factors $F_{HST}$ skip the value $F_{HST} = 0$. Mathematically, if the original coordinates are denoted by $(P, Q)$ and the HST-transformed coordinates by $(P_1, Q_1)$, then we get Eq (42). The HST is clearly a special affine transform, and its formula may be expressed as Eq (43).
• Vertical Shift Transform (VST). Here VST denotes the vertical shift transform, which works analogously to the HST. More specifically, the VST factor is identical to the HST factor.
• Noise Injection (NI). Gaussian noise (GN) with mean $z_m^G$ and variance $z_v^G$ is applied to the CXR, CT scan, and CSI. Eq (46) is used to perform NI, where G represents the gray level of the images and N stands for the probability density function. The NI function is processed using Eqs (47 & 48), where NI refers to the noise injection operation. We used GN in this study because, compared with impulse noise, speckle noise, and salt-and-pepper noise, it is the type that occurs most frequently in images.
• Gamma Correction (GCOR). We used GCOR to adjust the brightness level across an image. The GCOR factor ($F_\gamma$) skips the value 1. Eqs (49 & 50) are used to measure the GCOR.
• Random Translation (RTS). Each training image $z_{Train}(i)$ was translated $C_n$ times with a random horizontal shift $H_s$ and a random vertical shift $V_s$. Both parameters lie within the range $[-X_1, X_1]$ and follow a uniform distribution, as given in Eq (51), where $X_1$ represents the greatest possible shift factor. Eqs (52 & 53) are used to process the RTS.
• Scaling. The scaling factor $F_{Scal}$ was applied to each training image $z_{Train}(i)$, skipping the value $F_{Scal} = 1$. Eq (54) and Eq (55) are used to scale the images.
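A few of the augmentation operations listed above can be sketched on a tiny grayscale image stored as nested lists with pixel values in [0, 1]. This is an illustrative sketch only: the shift amounts, gamma factors, noise variance, zero-filling at the borders, and clipping to [0, 1] are assumptions rather than the settings used in this study, and rotation, random translation, and scaling are left out for brevity.

```python
import random

def horizontal_shift(img, shift):
    """Shift every row right by `shift` pixels (left if negative); vacated pixels become 0."""
    width = len(img[0])
    out = []
    for row in img:
        new_row = [0.0] * width
        for x, v in enumerate(row):
            if 0 <= x + shift < width:
                new_row[x + shift] = v
        out.append(new_row)
    return out

def gamma_correction(img, gamma):
    """Brightness adjustment: each pixel in [0, 1] is raised to the power gamma."""
    return [[v ** gamma for v in row] for row in img]

def noise_injection(img, mean=0.0, variance=0.01, seed=0):
    """Add Gaussian noise with the given mean and variance, clipped to [0, 1]."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    return [[min(1.0, max(0.0, v + rng.gauss(mean, sd))) for v in row] for row in img]

def multiple_way_generation(img):
    """Apply several augmentation methods to one image and collect the new images."""
    new_images = [horizontal_shift(img, s) for s in (-1, 1)]      # factor 0 skipped
    new_images += [gamma_correction(img, g) for g in (0.5, 2.0)]  # factor 1 skipped
    new_images.append(noise_injection(img))
    return new_images

# Toy 2 x 3 image (not a real CXR, CT scan, or CSI).
image = [[0.25, 0.50, 0.75],
         [1.00, 0.00, 0.25]]
generated = multiple_way_generation(image)
```

Each method skips its identity factor (shift 0, gamma 1), as described in the list above, so every generated image differs from the original.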

Summary of proposed models
In total, we proposed four new models [P(1), P(2), ..., P(4)]. In P(1), the size of the input is 299 × 299 × 1, while the output of the first ConvL is $B_1$ = 299 × 299 × 32. After the first MPL (MPL_1), the output is $B_2$ = 149 × 149 × 32. The output $B_{14}$ = 2 × 2 × 512 was obtained by repeating the ConvL process seven times. After this, a flatten layer converts the data into a single column vector $B_{15}$ = 1 × 1 × 2048. Two FCLs, F1 and F2, follow the flatten layer. The F1 layer uses the ReLU function, and its output is $B_{16}$ = 1 × 1 × 120. The F2 layer uses the SoftMax function, which classifies the data into their respective classes. The output of the F2 layer is $B_{17}$ = 1 × 1 × 9. Table 6 lists the hyperparameter values of P(1).
Based on P(1), we constructed the remaining three models. We upgraded P(1) by adding a BANL layer and a dropout layer and designated the result P(2). Next, we constructed P(3) by substituting RBAP for the conventional MPL used in P(2). Finally, MWDG was implemented for model P(3), which resulted in the new model P(4). A detailed description of these four proposed models is presented in Table 7.
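The spatial sizes quoted for P(1) can be reproduced with a short shape trace. The sketch assumes that each of the seven ConvL stages preserves the spatial size and is followed by a 2 × 2 pooling with stride 2 and no padding; only the 299 × 299 input, the pooling size of 2, and the final 512 channels come from the text.

```python
def trace_spatial_size(size=299, blocks=7, pool=2):
    """Spatial size after each pooling stage (stride = pool size, no padding)."""
    sizes = [size]
    for _ in range(blocks):
        size = (size - pool) // pool + 1
        sizes.append(size)
    return sizes

sizes = trace_spatial_size()
# Length of the flattened vector: final width x final height x 512 channels.
flattened_length = sizes[-1] * sizes[-1] * 512
```

The trace gives 299, 149, 74, 37, 18, 9, 4, 2, so the final feature map is 2 × 2 × 512 and `flattened_length` is 2048, in agreement with $B_{14}$ and $B_{15}$.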

Performance evaluation
Running the proposed model and the baseline models R times reduces the effect of randomness. The ideal ($I_M$) and actual ($A_M$) confusion matrices over the validation set are calculated for each run $r = 1, 2, 3, \ldots, R$. Eq (56) is used to measure the validation set.
The FMI is characterized by $m_1(r)$ and $m_3(r)$, as given in Eq (65).

$$m_7(r) = \sqrt{m_1(r) \times m_3(r)} \qquad (65)$$

After obtaining the seven indicators defined in Eqs (58-64) over the R runs, we calculate the mean (ME) and standard deviation (SD) of each $i$th measure ($\forall i \in [1, 7]$).
Eq (66) and Eq (67) are used to measure the ME and SD.
The final result, compiled from the R separate runs, is presented in the form ME ± SD.
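The reporting scheme above can be sketched with Python's standard library. The FMI function follows Eq (65); the use of the sample standard deviation (divisor R - 1) is an assumption, since the exact form of Eq (67) is not reproduced here, and the five run values are made-up placeholders rather than results from this study.

```python
import math
import statistics

def fmi(m1, m3):
    """FMI of one run: square root of the product of m1(r) and m3(r), as in Eq (65)."""
    return math.sqrt(m1 * m3)

def summarize(values):
    """Mean (ME) and sample standard deviation (SD) of one measure over R runs."""
    return statistics.mean(values), statistics.stdev(values)

def format_result(values):
    """Render a measure in the ME ± SD form used for reporting."""
    me, sd = summarize(values)
    return f"{me:.2f} ± {sd:.2f}"

# Placeholder values for R = 5 runs (illustrative only, not measured results).
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
report = format_result(runs)
```

For these placeholder values `report` is the string `3.00 ± 1.58`. Replacing `statistics.stdev` with `statistics.pstdev` would give the population form (divisor R) if that is what Eq (67) specifies.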

Proposed algorithm
The pseudocode for the proposed model is presented in Algorithm 1; it consists of an input ($I_{input}$), an output ($O_{output}$), and five sections [$S_1$, $S_2$, $S_3$, $S_4$, $S_5$]. The preprocessing of the CXR, CT scan, and CSI is shown in $S_1$. The steps for developing each of the four proposed models [P(1), P(2), P(3), P(4)] are presented in $S_2$. The R runs over the validation set are detailed in $S_3$. In $S_4$, we provide the methodology for choosing the most effective proposed model from [P(1), P(2), P(3), P(4)] based on the validation results.

Experimental setup
The proposed model was built with open-source TensorFlow (TF) [151] version 2.12.0, whereas the seven baseline DL models, i.e., Vgg-19, ResNet-101, ResNet-50, DenseNet-121, EfficientNetB0, DenseNet-201, and Inception-V3, were implemented with TF version 1.8. The Keras library was used as the backbone for each implementation. Python [152] was used for the processing steps that are unrelated to building the convolutional networks. The experiment was carried out on a workstation running the Windows operating system, with 32 GB of RAM and an NVIDIA graphics processing unit (GPU) with 11 GB of memory. The source code and a detailed description of the dataset are presented in supporting information S1 File.

Proposed models hyperparameters settings
The hyperparameter settings used in this research are presented in Table 8. The great majority of the values were determined through experimentation. The pooling size is set to 2. The number of DAUG methods is set to 7, and the number of new CXR, CT scan, and CSI images generated by each DAUG method is set to 14. The number of runs R for the proposed model and the seven baseline models is set to 5, a default value frequently used in other studies [5,10,15,24,28-30].

Visualization of the images after using MWDG
The results of the MWDG are shown in Fig 6. Figs 2 and 3 show the images in their original form. A single CXR, CT scan, or CSI image can produce 9 extra images. For this reason, our approach is called multiple-way data generation (MWDG).

AU (ROC) of the proposed models and baseline models
The true positive rate (TPR) is plotted against the false positive rate (FPR) on a receiver operating characteristic (ROC) curve. The greater the area under the ROC curve (AUC), the more effective the model is considered to be for chest disease diagnosis. The class-wise evaluation of the proposed P(4) model compared to the baseline models is represented by the AU(ROC), which can be seen in

GRAD-CAM visualization of proposed model
We employed Gradient-weighted Class Activation Mapping (Grad-CAM) [153] to show graphically how the proposed P(4) model reaches its conclusions. By examining the gradient of the classification score with respect to the convolutional features produced by the network, Grad-CAM determines which parts of an image are most important for classification. The "jet" pseudo-color map was used for the heat map [154-157]. Regions that are vital for AI diagnosis are therefore shown in red, whereas areas that are not important for AI diagnosis are shown in blue.

Ablation study
This study enhances the four suggested models, designated P(1) through P(4), by including the BANL, RBAP, ICS, and MWDG methodologies, in that order. We used the control variable technique, changing one variable at a time and statistically analyzing the experimental data, to evaluate whether each updated model is relevant to the nine chest diseases. The accuracy of each model in categorizing the nine chest diseases was evaluated and compared using the performance metrics to establish the significance of each added module. In Experiment 1, the original P(1) model is demonstrated;  11.
When the findings of Experiment 1 and Experiment 2 are compared, it is evident that adding the BANL and dropout layer to the BM improves the model's average classification accuracy (m4) for chest disorders by 3.28%. This demonstrates that BANL makes the training of the suggested models more stable while also speeding it up. BANL normalizes the input to a layer, ensuring that each mini-batch has a distribution comparable to the others. This helps to avoid problems such as ICS and enables more stable and faster convergence while training the proposed model. As a result, the quality of the feature mapping is improved and the model's overall accuracy increases substantially. Comparing Experiment 1 and Experiment 3 shows an improvement of 6.67 percentage points in classification accuracy. This demonstrates that switching from the MPL to the RBAP improves the accuracy of the model while maintaining the same perceptual field. When Experiment 1 and Experiment 4 are compared, the model's average recognition accuracy increases by 8.79%. This suggests that the proposed P(4) model, which combines the BANL, RBAP, and MWDG to obtain higher classification accuracy, is more effective than the other models in terms of overall performance.
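To make the difference between max pooling and rank-based average pooling concrete, the sketch below follows the commonly used definition of RBAP: within each pooling window the activations are ranked, and the top-ranked ones are averaged. The exact formulation used in the proposed models is not given in this section, so the `top_k` parameter and the function name are assumptions; the window size of 2 matches the pooling size stated in the hyperparameter settings.

```python
def rank_based_average_pool(image, pool=2, top_k=2):
    """Pool a 2D map by averaging the top_k largest values in each window."""
    pooled = []
    for i in range(0, len(image) - pool + 1, pool):
        row = []
        for j in range(0, len(image[0]) - pool + 1, pool):
            window = [
                image[i + di][j + dj]
                for di in range(pool)
                for dj in range(pool)
            ]
            # Rank activations from largest to smallest and keep the top_k.
            ranked = sorted(window, reverse=True)[:top_k]
            row.append(sum(ranked) / len(ranked))
        pooled.append(row)
    return pooled


# Toy 4 x 4 activation map (hypothetical values).
activations = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [0, 0, 1, 1],
    [9, 1, 1, 1],
]
print(rank_based_average_pool(activations))           # [[3.5, 7.5], [5.0, 1.0]]
print(rank_based_average_pool(activations, top_k=1))  # [[4.0, 8.0], [9.0, 1.0]]
```

With `top_k=1` the operation reduces to max pooling, and with `top_k=pool*pool` it reduces to average pooling; intermediate values keep the strongest responses while damping a single outlier, as in the lower-left window above.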

Comparison of proposed model with state-of-the-arts
This section compares the classification performance of the proposed P(4) model with recent state-of-the-art (SOTA) models in terms of several performance evaluation metrics, as shown in Table 12.

Discussions
The term chest disease covers a wide variety of medical conditions that affect the thoracic region, which includes the organs needed for breathing and circulation [12][13][14][15][16]. PNEU leads to inflammation in the air sacs of the lungs and is typically brought on by bacteria [5], viruses [6], or fungi [17,25-32]. It can be fatal if left untreated. Infection with COVID-19 [28] primarily affects the respiratory system, leading to lung inflammation and damage [44]. Acute respiratory distress syndrome (ARDS) [46] is characterized by severe difficulty in breathing and can sometimes develop in extreme settings [71]. Both acute and chronic TB [83] are caused by inflammation of the bronchial tubes and have the same restrictive impact on breathing [92]. Individuals with PNEUTH have trouble breathing because their bronchial airways have become inflamed and narrowed [91]. Patients with LC often have difficulty breathing, which is a sign of the condition. As previously mentioned, several researchers [83-94, 97, 99] have used medical imaging modalities such as CXR, CT scans, and CSI to identify chest disorders. Diagnostic imaging has become increasingly important in the treatment of chest conditions. CXR [157][158][159] and CT scans [160] are essential diagnostic tools that provide a fast and easily accessible overview of the chest's internal structures, such as the heart, lungs, ribs, and diaphragm. Moreover, a few studies [137][138][139][140][141][142] used cough sounds to identify several chest diseases.

This study proposed four models, P(1), P(2), P(3), and P(4), for the classification of nine different chest diseases using CXR, CT scans, and CSI. P(1) is our base model, which is built on a CNN. We then upgraded the P(1) model to the P(2) model by adding BANL and a dropout layer. Adding these layers enhanced the performance of the proposed model P(2), which can be viewed in Table 10. Moreover, adding BANL made the proposed model P(2) stable during training and resolved the ICS problem. Afterward, we proposed the P(3) model, in which we replaced the MPL with RBAP. The purpose of this replacement is to maintain the relationship between pixel values of the CXR, CT scans, and CSI. Finally, we added MWDG to P(3), yielding P(4), to generate synthetic images during training. The P(4) model achieved the highest accuracy, 99.01%, compared with the P(1) to P(3) models (see Table 8 9. It has been observed from Table 9 that the proposed P(4) model attains the highest classification outcomes in terms of the performance evaluation metrics m1 to m7 when compared with the baseline models.

In this study, the classification accuracy of the P(4) model is measured against that of SOTA classifiers, as shown in Table 12. Using CXR, CT scans, and CSI, the P(4) model can detect COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, NOR, and PNEU. Comparison of the experimental findings with modern SOTA approaches indicates that this can be of considerable help to clinical experts. Constantinou et al. [90] designed a ResNet-101 model for the classification of COVID-19 and healthy cases using CXR. They achieved an accuracy (m4) of 96.99%. Duong et al. [92] developed a DCNN model using CT scan images for the classification of COVID-19, non-COVID-19, and PNEU cases. They attained an m4 of 96.60%. The study [97] proposed a Vgg-16 model that classifies chest diseases such as COVID-19, non-COVID-19, and PNEU. Cohen et al. [155] suggested a CNN-based model named CSSNet for the classification of COVID-19 and normal cases. Their classification accuracy was 91.02%. The study [157] designed a COVNet model for the identification of three classes, i.e., COVID-19, PNEU, and non-COVID-19 cases, using CXR images. The COVNet model achieved an accuracy of 90.33%. Loey et al. [156] proposed the GGNet model for the classification of COVID-19 cases using CXR images. The study [101] used the ResNet-18 model for the classification of COVID-19 cases using cough sounds.
Tables 9-12 demonstrate that the suggested P(4) model is more capable of diagnosing anomalies and extracting the dominant and discriminative patterns in imaging data, such as CXR, CT scans, and cough sound samples, with an m4 result of 99.01%. Table 10 also includes the results of seven additional CNN-based pre-trained classifiers, and our in-depth investigation into the origins of COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, NOR, and PNEU utilizing CXRs, CT scans, and CSI explains the lower classification performance observed in the prior art. The classification performance of the CNN-based pre-trained deep networks has been hindered by the initial step of the process, in which the deep networks are reduced to their final ConvLs. These pre-trained classifiers also have an inadequate filter size, because the number of neurons connected to the input is so large that the major components are completely ignored. The developed P(4) model offers a solution to these problems. This research established an end-to-end CNN-based P(4) model in conjunction with the BANL, RBAP, and MWDG to diagnose numerous chest conditions utilizing CXRs, CT scans, and CSI. With the P(4) model, low resolution and overlaps in the inflamed regions of CXR and CT scans are no longer an issue. In addition to improving classification performance and speeding up convergence, this approach significantly mitigates the negative effects of structured noise. Using CXR, CT scans, and CSI, the P(4) model was able to correctly classify COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, NOR, and PNEU. The results suggest that this method can be of considerable benefit to medical experts.

Conclusion and future work
This paper proposed four network models for the classification of nine different chest conditions, namely COVID-19, LC, ATE, COL, TB, PNEUTH, EDE, NOR, and PNEU, using CXRs, CT scans, and CSI. According to the experimental findings, model P(4) attains the highest level of performance compared to the other proposed models (P(1) through P(3)) and the seven baseline models. In addition, the P(4) model produces results that are superior to those of other SOTA methods. The suggested P(4) model has the best performance because (i) it can learn individual CXR, CT scan, and CSI-level representations, and (ii) it is a novel DL model whose structure was constructed and whose weights were trained from scratch. Both of these aspects contribute to the model's ability to learn. In addition, P(4) made use of several more advanced methods, including MWDG, BANL, dropout, and RBAP. The proposed P(4) model achieves the highest classification accuracy, 99.01%, compared to the baseline models and SOTA. Additionally, an ablation study was performed to observe the effectiveness of the proposed model. A shortcoming of the proposed P(4) model is that it can handle only CSI, CT scans, and CXRs; it will not function appropriately when applied to sonography and MRI images. In the future, we plan to integrate federated learning and blockchain technology with the proposed model to ensure patient data privacy.
Fig 1 depicts the suggested framework of the present study.

Fig 1. CSI, CXR, and CT scans are the three diagnostic tools that are indicated for use in the process of identifying a variety of chest disorders.

The dataset contains a total of 2123 images, including 500 images of EDE, 400 images of ATE, 500 images of COL, and 723 images of PNEUTH. A sample image of COVID-19 and other chest disorder CT scans and CXR is depicted in Fig 2. Table 2 gives a detailed summary of the CXR and CT scan images used for the classification of several chest diseases.
we can have $W_4 = H_4 = 400$ and $M_4 = M_2 = 1$. Fourth, we reduced the size of each image to $[W_5, H_5]$ pixels, which gives the downsized image set $D_5$ by using Eqs (9 & 10).

$$D_5 = \downarrow (D_4, [W_5, H_5]) \qquad (9)$$

$$D_5 = \{d_5(1), d_5(2), d_5(3), \ldots, d_5(a), \ldots, d_5(|D|)\} \qquad (10)$$

where $\downarrow: O \to I$ denotes the downsampling (DS) function. The parameter $I$ is a downsampled CXR, CT scan, or CSI of the original image $O$. For the present work, the images were downsampled to a fixed resolution of $W_5 = H_5 = 299$, $M_5 = 1$. There are several advantages to DS, some of which are indicated in Table

Fig 5 gives a graphical representation of these four proposed models. First, we designed a CNN-based model named P(1). P(1) is called the base model (BM) of this study. Fig 5 represents the activation maps of the proposed BM in the P (
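The downsampling step of Eq (9), which maps each 400 x 400 single-channel image to 299 x 299, can be sketched as follows. The interpolation method is not stated in this section, so nearest-neighbor sampling is used here purely as an assumption, and the function name is hypothetical. Images are represented as nested lists so that the sketch runs without dependencies.

```python
def downsample_nearest(image, new_height, new_width):
    """Resize a 2D image (nested lists) with nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    # Map every target row/column back to a source row/column index.
    rows = [min(old_height - 1, (r * old_height) // new_height)
            for r in range(new_height)]
    cols = [min(old_width - 1, (c * old_width) // new_width)
            for c in range(new_width)]
    return [[image[r][c] for c in cols] for r in rows]


# Synthetic 400 x 400 image whose pixel value encodes its position.
original = [[r * 400 + c for c in range(400)] for r in range(400)]
resized = downsample_nearest(original, 299, 299)
print(len(resized), len(resized[0]))  # 299 299
```

In practice an image library with area or bilinear interpolation would usually be preferred, since nearest-neighbor sampling can alias fine structures.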

Fig 7 shows the training and validation loss and accuracy with respect to epochs. The proposed P(4) model and the seven baseline models were trained for up to 50 epochs. The Vgg-19 model attained a training accuracy of 0.981 and a validation accuracy of 0.980, with a training loss of 0.21 and a validation loss of 0.25. ResNet-101 and ResNet-50 achieved training accuracies of 0.929 and 0.971, respectively. DenseNet-121, Inception-V3, and EfficientNetB0 achieved training accuracies of 0.929, 0.969, and 0.960, respectively. The proposed P(4) model achieved the highest training accuracy, 0.988, with a validation accuracy of 0.973. Additionally, the training and validation losses of the proposed P(4) model are 0.100 and 0.101, respectively. The training and validation accuracy and loss values indicate that the proposed P(4) model trained well on the data used for the classification of nine different chest diseases. The detailed results of the proposed P(4) model and the baseline models are presented in Fig 7.
Fig 10 presents the Grad-CAM heat maps for nine distinct chest disorders using CXR, CT scan, and CSI, respectively. In Fig 10, red demarcates the infected area, where the base glass opacity (BGO) [158][159][160] is visible. This shows that the AI concentrates on the BGO infection, which suggests that it has successfully captured the BGO lesions. Second, the AI gives some attention to the tracheae. The grayscale values of the tracheal tissue shown in the center of Fig 10 may have been altered by COVID-19, causing the yellow regions.

Fig 10. GRAD-CAM visualization of the proposed model for highlighting the infected region of nine chest diseases.

Additionally, Fig 10 presents the GRAD-CAM visualization of the proposed P(4) model, which highlights the infected region of the lungs. The results of the proposed models are also compared with several baseline models, i.e., Vgg-19, ResNet-101, ResNet-50, DenseNet-121, EfficientNetB0, DenseNet-201, and Inception-V3, as shown in Table 1. A novel CNN-based model has been designed to classify the eight different chest diseases by using CXR, CT scans, and CSI.
2. The novel proposed model was designed by replacing the traditional MPL with RBAP, and BANL was added to solve ICS. Additionally, MWDG techniques were used to prevent the model from overfitting during training.

Table 1. Recent literature on chest disease identification using the DL model.

[43]h et al. [57] used healthy CT images of patients to fine-tune well-known TL classifiers, such as MobileNet-v2. This allowed them to identify COVID-19 more accurately. The MobileNet-v2 model attained a classification accuracy of 96.40%. Malik et al. [43] presented an innovative method that produces a global model through blockchain-based federated learning (FL). This system collects data from five separate databases (different hospitals) and aggregates it. FL trains the model on a global scale while maintaining the hospitals' right to privacy by utilizing blockchain technology (BCT) to authenticate the data. The suggested framework was split into three sections. The initial step was to normalize the data, because the diverse collection had been obtained from five separate sources using several different CT scanners.
After that, COVID-19 patients were classified using CapsNet in conjunction with IELMs. Lastly, a global model was trained while retaining anonymity using BCT and FL. They maintained patient confidentiality while classifying COVID-19 cases with an accuracy of 98.99%. Using CT scans, Kogilavani et al.

Table 5. CXR, CT scan, and CSI size and storage at each preprocessing step.

$Z_{Train} = [z_{Train}(1), z_{Train}(2), \ldots, z_{Train}(i), \ldots, z_{Train}(|Z_{Train}|)]$ represents the training portion of the dataset, while $Z_{Val} = [z_{Val}(1), z_{Val}(2), \ldots, z_{Val}(i), \ldots, z_{Val}(|Z_{Val}|)]$ and $Z_{Test} = [z_{Test}(1), z_{Test}(2), \ldots, z_{Test}(i), \ldots, z_{Test}(|Z_{Test}|)]$ denote the validation and testing portions of the dataset, respectively. The total size of $Z_{Train}$, $Z_{Val}$, and $Z_{Test}$ is equal to the size of the preprocessed dataset, i.e., $|Z_{Train}| + |Z_{Val}| + |Z_{Test}| = |D_5|$. The entire $Z_{Train}$ image collection was processed using seven different DAUG approaches, each of which had a different MWDG factor $F$ applied to it. Additionally, each MWDG method results in the creation of $C_n$ additional images. Let the MWDG output of $Z_{Train}$ be represented as $Z_{TrainD} = \{z_{TrainD}(i)\}$.
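The partition of the preprocessed dataset into training, validation, and test subsets can be sketched as follows. The split fractions, the random seed, and the function name are hypothetical, since the actual ratios are not given in this section; the sketch only illustrates that the three subsets are disjoint and that their sizes sum to the size of the preprocessed dataset.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    z_test = shuffled[:n_test]
    z_val = shuffled[n_test:n_test + n_val]
    # Everything that is left over forms the training portion.
    z_train = shuffled[n_test + n_val:]
    return z_train, z_val, z_test


# Image identifiers stand in for the preprocessed images (hypothetical size).
z_train, z_val, z_test = split_dataset(range(1000))
print(len(z_train), len(z_val), len(z_test))  # 700 100 200
```

Only the training portion would then be passed to the augmentation step, so that the validation and test subsets contain no synthetic images.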

Table 10. Results comparison of the proposed model with other baseline models.

92.98 ± 2.09, m3 = 92.99 ± 2.11, m4 = 92.97 ± 2.03, m5 = 92.97 ± 2.13, m6 = 92.97 ± 2.17

= consolidated dataset of 33,579 CXR, CT scans, and CSI including 3716 COVID-19 images, 3727 LC images, 3753 ATE images, 3752 COL images, 3784 TB images, 3704 PNEUTH images, 3707 EDE images, 3714 PNEU images, and 3722 NOR images. In the confusion matrix, the actual cases are placed along the rows and the predicted cases along the columns. Vgg-19 correctly classified 3661 cases of COVID-19 and misclassified 1 case as LC, 8 cases as ATE, 8 cases as COL, 17 cases as TB, 2 cases as PNEUTH, 9 cases as EDE, 2 cases as PNEU, and 8 cases as NOR. ResNet-101 correctly classified 3654 cases of COVID-19 and misclassified 15 cases as ATE, 11 cases as PNEU, 11 cases as NOR, and 21 cases as EDE. ResNet-50 correctly classified 3638 cases of COVID-19, and DenseNet-121 correctly classified 3618 cases of COVID-19. EfficientNetB0 correctly classified 3529 cases of COVID-19 and misclassified 17 cases as LC, 33 cases as ATE, 9 cases as COL, 33 cases as TB, 13 cases as EDE, 33 cases as PNEU, and 16 cases as NOR. Furthermore, DenseNet-201 correctly classified 3529 cases of COVID-19 and misclassified 1 case as LC, 6 cases as ATE, 13 cases as COL, 1 case as TB, 2 cases as PNEUTH, 10 cases as EDE, and 10 cases as NOR. The proposed P(4) model produced significant results compared to the other models and accurately classified 3895 cases of COVID-19, 3869 cases of LC, 3870 cases of ATE, 3901 cases of COL, 3911 cases of TB, 3893 cases of PNEUTH, 3916 cases of EDE, 3899 cases of PNEU, and 3906 cases of NOR. The detailed results are presented in Fig 8.
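As a worked example of reading one row of the confusion matrix, the sketch below takes the COVID-19 row reported above for Vgg-19 (actual class along the row, predicted classes along the columns) and computes the per-class recall, i.e., the fraction of actual COVID-19 images that were predicted as COVID-19. It also checks that the per-class image counts add up to the stated dataset size. The variable and function names are hypothetical; the numbers are those quoted in the text.

```python
# Per-class image counts of the consolidated dataset, as quoted in the text.
class_counts = {
    "COVID-19": 3716, "LC": 3727, "ATE": 3753, "COL": 3752, "TB": 3784,
    "PNEUTH": 3704, "EDE": 3707, "PNEU": 3714, "NOR": 3722,
}

# COVID-19 row of the Vgg-19 confusion matrix: predicted class -> count.
vgg19_covid_row = {
    "COVID-19": 3661, "LC": 1, "ATE": 8, "COL": 8, "TB": 17,
    "PNEUTH": 2, "EDE": 9, "PNEU": 2, "NOR": 8,
}


def class_recall(row, true_class):
    """Recall for one class = correct predictions / all actual cases in the row."""
    return row[true_class] / sum(row.values())


print(sum(class_counts.values()))       # 33579
print(sum(vgg19_covid_row.values()))    # 3716
print(round(class_recall(vgg19_covid_row, "COVID-19"), 4))  # 0.9852
```

The row total of 3716 matches the number of COVID-19 images in the dataset, which is a quick consistency check that can be applied to any reported confusion-matrix row.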